Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext
We consider the problem of learning general-purpose, paraphrastic sentence
embeddings in the setting of Wieting et al. (2016b). We use neural machine
translation to generate sentential paraphrases via back-translation of
bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to
serve as training data for learning paraphrastic sentence embeddings. We find
that the resulting data is of higher quality than prior bitext-based data and
on par with manually written English paraphrase pairs, with the advantage that our
approach can scale up to generate large training sets for many languages and
domains. We experiment with several language pairs and data sources, and
develop a variety of data filtering techniques. In the process, we explore how
neural machine translation output differs from human-written sentences, finding
clear differences in length, the amount of repetition, and the use of rare
words.
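The abstract mentions data filtering and clear differences in length and repetition between machine-translated and human-written sentences, but it does not spell out the filters. As a rough illustration only, the Python sketch below keeps a (reference, back-translation) pair if it passes a length-ratio check, a token-repetition check and an exact-copy check. The function names `keep_pair` and `repetition_rate` and both thresholds are hypothetical and are not the authors' settings.

```python
from collections import Counter


def repetition_rate(tokens):
    """Fraction of tokens that repeat an earlier token in the same sentence."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(tokens)


def keep_pair(reference, back_translation, max_len_ratio=1.5, max_repetition=0.3):
    """Hypothetical filter for a (reference, back-translated paraphrase) pair.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    ref = reference.lower().split()
    bt = back_translation.lower().split()
    if not ref or not bt:
        return False
    # Reject pairs whose token lengths differ too much.
    ratio = max(len(ref), len(bt)) / min(len(ref), len(bt))
    if ratio > max_len_ratio:
        return False
    # Reject degenerate MT output that keeps repeating tokens.
    if repetition_rate(bt) > max_repetition:
        return False
    # Reject exact copies, which carry no paraphrastic signal.
    if ref == bt:
        return False
    return True


pairs = [
    ("the cat sat on the mat", "the cat was sitting on the mat"),
    ("he left early", "he he he he left left early"),
    ("good morning", "good morning"),
]
kept = [p for p in pairs if keep_pair(*p)]
```

With these toy pairs, only the first survives: the second fails the length-ratio check and the third is an exact copy.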
Zero-Shot Crosslingual Sentence Simplification
Sentence simplification aims to make sentences easier to read and understand. Recent approaches have shown promising results with encoder-decoder models trained on large amounts of parallel data, which often exist only in English. We propose a zero-shot modeling framework that transfers simplification knowledge from English to another language (for which no parallel simplification corpus exists) while generalizing across languages and tasks. A shared transformer encoder constructs language-agnostic representations, with a combination of task-specific encoder layers added on top (e.g., for translation and simplification). Empirical results using both human and automatic metrics show that our approach produces better simplifications than unsupervised and pivot-based methods.
Universal rewriting via machine translation
Natural language allows the same meaning (semantics) to be expressed in multiple different ways, i.e., paraphrasing. This thesis examines automatic approaches to paraphrasing, focusing on three paraphrasing subtasks: unconstrained paraphrasing, where there are no constraints on the output; simplification, where the output must be simpler than the input; and text compression, where the output must be shorter than the input.
Whilst we can learn paraphrasing from supervised data, such data is sparse and expensive to create. This thesis is concerned with the use of transfer learning to improve paraphrasing when no supervised data is available. In particular, we address the following question: can transfer learning be used to overcome a lack of paraphrasing data? To answer this question, we split it into three subquestions: (1) No supervised data exists for a specific paraphrasing task; can bilingual data be used as a source of training data for paraphrasing? (2) Supervised paraphrasing data exists in one language but not in another; can bilingual data be used to transfer paraphrasing training data from one language to another? (3) Can the output of encoder-decoder paraphrasing models be controlled?
Teaching Small Language Models to Reason
Chain-of-thought prompting successfully improves the reasoning capabilities
of large language models, achieving state-of-the-art results on a range of
datasets. However, these reasoning capabilities only appear to emerge in models
with over 100 billion parameters. In this paper, we explore the
transfer of such reasoning capabilities to models with fewer than 100 billion
parameters via knowledge distillation. Specifically, we finetune a student
model on the chain-of-thought outputs generated by a larger teacher model. Our
experiments show that the proposed method improves task performance across
arithmetic, commonsense and symbolic reasoning datasets. For example, the
accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on
chains of thought generated by PaLM-540B.
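A minimal sketch of how the distillation data might be assembled: each teacher chain of thought becomes the finetuning target for its question. The `Example` class, the assumed "The answer is X." ending of each chain, and the optional filter that drops chains reaching a wrong answer are illustrative assumptions rather than details stated in the abstract.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    gold_answer: str
    teacher_cot: str  # chain of thought generated by the larger teacher model


# Assumed output format: every chain ends with "The answer is X."
ANSWER_RE = re.compile(r"The answer is\s+(.+?)\.?\s*$")


def extract_answer(cot):
    """Return the final answer of a chain of thought, or None if the format is not matched."""
    match = ANSWER_RE.search(cot.strip())
    return match.group(1).strip() if match else None


def build_student_data(examples, filter_incorrect=True):
    """Turn teacher chains of thought into (input, target) pairs for finetuning the student."""
    data = []
    for ex in examples:
        # Optional (assumed) step: skip chains whose final answer is wrong.
        if filter_incorrect and extract_answer(ex.teacher_cot) != ex.gold_answer:
            continue
        data.append((ex.question, ex.teacher_cot))
    return data


examples = [
    Example("Q: What is 2 + 3?", "5", "2 plus 3 is 5. The answer is 5."),
    Example("Q: What is 4 * 2?", "8", "4 times 2 is 6. The answer is 6."),
]
student_data = build_student_data(examples)
```

The resulting pairs would then be fed to any standard sequence-to-sequence finetuning loop for the student (T5 XXL in the abstract's example).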
Small Language Models Improve Giants by Rewriting Their Outputs
Large language models (LLMs) have demonstrated impressive few-shot learning
capabilities, but they often underperform fine-tuned models on challenging
tasks. Furthermore, their large size and restricted, API-only access make
task-specific fine-tuning impractical. LLMs are also sensitive to different
aspects of prompts (e.g., the selection and order of demonstrations) and can
thus require time-consuming prompt engineering. In this light, we propose a
method to correct LLM outputs without relying on their weights. First, we
generate a pool of candidates by few-shot prompting an LLM. Second, we refine
the LLM-generated outputs using a smaller model, the LM-corrector (LMCor),
which is trained to rank, combine and rewrite the candidates to produce the
final target output. Our experiments demonstrate that even a small LMCor model
(250M parameters) substantially improves the few-shot performance of LLMs (62B
parameters) across diverse tasks. Moreover, we show that LMCor is robust to
different prompts, thereby minimizing the need for extensive prompt
engineering. Finally, we show that LMCor can be seamlessly integrated with
different LLMs at inference time, serving as a plug-and-play module to improve
their performance.
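The two-step LMCor procedure can be sketched as follows, assuming both the LLM and the corrector are exposed as plain text-in/text-out callables. The input format built by `build_corrector_input`, the `num_candidates` default, and the toy stand-ins are hypothetical. In the paper the corrector is a trained 250M-parameter model; here a simple majority vote stands in for it only so that the sketch runs.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def build_corrector_input(source: str, candidates: List[str]) -> str:
    """Concatenate the source and the LLM candidates into one corrector input (assumed format)."""
    parts = [f"source: {source}"]
    parts += [f"candidate {i}: {c}" for i, c in enumerate(candidates, start=1)]
    return " ".join(parts)


def correct_with_small_model(
    source: str,
    few_shot_prompt: str,
    llm_sample: Callable[[str], str],
    corrector: Callable[[str], str],
    num_candidates: int = 5,
) -> str:
    # Step 1: sample a pool of candidates by few-shot prompting the LLM (its weights are never used).
    prompt = f"{few_shot_prompt}\nInput: {source}\nOutput:"
    candidates = [llm_sample(prompt) for _ in range(num_candidates)]
    # Step 2: the small corrector ranks, combines and rewrites the candidates into the final output.
    return corrector(build_corrector_input(source, candidates))


# Toy stand-ins (hypothetical): a fake LLM that cycles through fixed outputs...
_outputs = cycle(["teh cat sat.", "the cat sat.", "the cat sat."])


def toy_llm(prompt: str) -> str:
    return next(_outputs)


# ...and a majority vote over the candidates in place of a trained LMCor.
def toy_corrector(text: str) -> str:
    cands = [seg.split(": ", 1)[1] for seg in text.split(" candidate ")[1:]]
    return Counter(cands).most_common(1)[0][0]


result = correct_with_small_model("the cat sat", "Fix the spelling.", toy_llm, toy_corrector)
```

Because the LLM is only called through `llm_sample`, swapping in a different LLM requires no change to the corrector, which mirrors the plug-and-play property claimed in the abstract.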
Opsoclonus-Myoclonus Presenting With Features of Spasmus Nutans
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/66540/2/10.1177_088307389501000117.pd